| Variable | N = 299¹ |
|---|---|
| anaemia | |
| No anaemia | 170 (57%); 0 missing |
| anaemia | 129 (43%); 0 missing |
| diabetes | |
| No Diabetes | 174 (58%); 0 missing |
| Diabetes | 125 (42%); 0 missing |
| DEATH_EVENT | |
| 0 | 203 (68%); 0 missing |
| 1 | 96 (32%); 0 missing |
| age | |
| Mean, Median | 61, 60 |
| Range | 40, 95 |
| IQR | 51, 70 |
| SD | 12 |
| sex | |
| Female | 105 (35%); 0 missing |
| Male | 194 (65%); 0 missing |
| smoking | |
| 0 | 203 (68%); 0 missing |
| 1 | 96 (32%); 0 missing |
| high_blood_pressure | |
| No HBP | 194 (65%); 0 missing |
| HBP | 105 (35%); 0 missing |
| ¹ n (%); number missing | |
Survival Analysis: Evaluating the Impact of Comorbidities on Mortality Risk in Heart Failure Patients
1 Introduction
Heart failure (HF) is a rapidly growing public health issue, with an estimated global prevalence of 64 million people (Shahim et al., 2023). It is a chronic condition in which the heart muscle does not pump blood as well as it should, so the heart cannot supply enough blood to meet the body’s demand. Symptoms include shortness of breath, fatigue and weakness, wheezing, chest pain, and swelling in the legs, ankles, and feet. The primary cause of heart failure is a damaged, weakened, or stiff heart, which can result from medical conditions such as coronary artery disease, heart attack, high blood pressure, heart valve disease, inflammation of the heart muscle (myocarditis), diabetes, and other diseases. In addition to these conditions, aging, smoking, alcohol use, and certain medications are risk factors for heart failure (Mayo Clinic, 2023).
Given the significant impact of heart failure on global health, it is crucial to study its progression and mortality rates in various populations. A study was conducted in Faisalabad, Pakistan, and focused on estimating death rates due to heart failure. The study included 299 patients diagnosed with heart failure having left ventricular systolic dysfunction. Diagnoses were confirmed through cardiac echo reports or physician notes. The patients were classified as Class III or IV according to the New York Heart Association (NYHA) functional classification, indicating the severity of the condition.
The study assessed various risk factors potentially associated with mortality. While most data were collected from blood reports, two risk factors, smoking status and blood pressure, were obtained from physicians’ notes. These factors, along with age, sex, serum sodium, serum creatinine, ejection fraction, anemia, platelets, creatine phosphokinase (CPK), and diabetes, were analyzed to understand their impact on heart failure mortality. Patients were followed for between 4 and 285 days, with an average follow-up time of 130 days, allowing mortality rates and associated risk factors to be assessed in this specific population (Ahmad et al., 2017).
Project Goal
The goal of this project is to investigate the association between each of three comorbidities (diabetes, anemia, and high blood pressure) and the hazard of death in patients with heart failure. The following scientific objectives will be addressed:
To examine the association between diabetes and the hazard of death, adjusting for age, sex, and smoking status.
To examine the association between anemia and the hazard of death, adjusting for age, sex, and smoking status.
To examine the association between high blood pressure and the hazard of death, adjusting for age, sex, and smoking status.
2 Methods
Data Preparation
The dataset was tidied to ensure completeness and correctness. This involved converting data types, recoding the levels of the categorical variables (e.g., changing sex from ‘1’/‘0’ to ‘Male’/‘Female’), and selecting the variables utilized for the analyses. The tidied dataset consists of 299 cases and eight variables, namely:
Age (years), Smoking status (Yes/No), Sex (Male/Female).
Binary Indicators for Anaemia, Diabetes, and High Blood Pressure.
The death event and corresponding time (days). (Heart Failure Clinical Records, 2020)
Exploratory Data Analysis
Exploratory analysis was conducted to gain insight into the distribution of the variables in the dataset. This involved generating a summary statistics table that included measures of central tendency and spread for continuous variables, and frequencies and proportions for categorical variables.
Statistical Analysis
Kaplan-Meier Survival Analysis
Before investigating the association between the hazard of death from heart failure and specific conditions (anemia, diabetes, and high blood pressure), we estimated the survival function. The survival function, S(t), is the probability that an individual has not yet experienced the event by time t. We estimated:
The survival function for all patients pooled together.
The survival function for patients stratified by diabetes, anemia, and high blood pressure status.
The survival function was estimated using the Kaplan-Meier estimator, a nonparametric method that does not assume a particular shape for the survival distribution. At each observed event time t, the Kaplan-Meier estimate is updated using the formula:
\(\hat{S}(t) = \left(1 - \frac{n.\text{event}}{n.\text{risk}}\right) \hat{S}(t_{\text{prev}})\) where:
n.event is the number of deaths (non-censored events) occurring at time t,
n.risk is the number of patients still at risk of death just before time t, and
\(t_{\text{prev}}\) is the previous event time, with \(\hat{S}(t) = 1\) before the first event.
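The product-limit update above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the code used for this report: the function name `kaplan_meier` and the six-patient toy data are hypothetical, and event indicators are assumed to be coded 1 for death and 0 for censoring.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, n_risk, n_event, S_hat)] at each distinct death time.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    n_risk = len(times)
    surv = 1.0                # S_hat before the first event
    table = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # S_hat(t) = (1 - n.event / n.risk) * S_hat(t_prev)
            surv *= 1.0 - d / n_risk
            table.append((t, n_risk, d, surv))
        n_risk -= leaving[t]  # shrink the risk set after time t
    return table

# Hypothetical toy data: follow-up in days, 1 = death, 0 = censored.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
km_table = kaplan_meier(toy_times, toy_events)
for t, n_risk, n_event, s_hat in km_table:
    print(f"t={t:>3}  n.risk={n_risk}  n.event={n_event}  S_hat={s_hat:.3f}")
```

Censored patients never reduce the estimate directly; they only shrink the risk set used at later event times.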
The survival functions were plotted, and the log-rank test was used to test the null hypothesis that the groups’ survival functions are identical: \(H_{0}: S_{1}(t)=S_{2}(t)\) for all t (Nahhas, 2024).
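For intuition, the two-group log-rank statistic can be sketched in a few lines of Python. This is a simplified illustration, not the routine used for the analysis: the function name `logrank_test` and the toy data are hypothetical, groups are assumed to be coded 0/1, and the p-value uses the chi-square distribution with 1 degree of freedom, computed via the complementary error function.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test. Returns (chi_square, p_value) on 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0  # observed minus expected deaths in group 1
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square(1): P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical toy data: 1 = death, 0 = censored; group 1 = condition present.
lr_times = [1, 2, 3, 4, 5, 6, 7, 8]
lr_events = [1, 1, 1, 0, 1, 1, 0, 1]
lr_groups = [1, 1, 0, 1, 0, 1, 0, 0]
lr_chi2, lr_p = logrank_test(lr_times, lr_events, lr_groups)
print(f"chi-square = {lr_chi2:.3f}, p = {lr_p:.3f}")
```

Under the same 1-df reference distribution, a statistic of 4.41 (the value reported for high blood pressure in this report) corresponds to p ≈ 0.036, which rounds to 0.04.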
Cox Proportional Hazards Model
The project aimed to investigate the association between the hazard of death from heart failure and three medical conditions: anemia, diabetes, and high blood pressure. To this end, we fitted three Cox proportional hazards models. The Cox model is a semi-parametric approach that leaves the baseline hazard unspecified and assumes that the hazard function (the instantaneous risk of death among those still alive) depends on the predictors through a set of regression parameters.
We fitted three separate Cox proportional hazards models, one per condition, of the form:
\(h_{i}(t)=h_{0}(t)e^{\beta_1X_{i1}+\beta_2X_{i2}+\beta_3X_{i3} + \beta_4X_{i4}}\)
which can equivalently be expressed as:
\(\log\left(\frac{h_{i}(t)}{h_{0}(t)}\right) = \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4}\)
\(h_0(t)\) is the baseline hazard, and it is the hazard for a patient with \(X_{i1}=0, X_{i2}=0, X_{i3}=0, X_{i4}=0\).
\(X_{i1}\) is the predictor of interest (indicator variable for diabetes, anemia, or high blood pressure).
\(X_{i2}, X_{i3}, X_{i4}\) represent the potential confounding variables: Age, Sex, and Smoking status, in that order.
The parameters \(\beta_1, \beta_2, \beta_3, \beta_4\) are unknown and estimated by the method of maximum partial likelihood.
\(e^{\beta_1}\) is the parameter of interest and represents the hazard ratio comparing the hazard of death at time t between patients with the condition (\(X_{i1}=1\)) and those without it (\(X_{i1}=0\), the reference level), holding all other predictors fixed.
To evaluate the statistical evidence for an association, the following hypotheses were tested for each model:
\(H_0: \beta_1=0\) (Null hypothesis: There is no association between the hazard of death and the predictor of interest, after adjusting for age, sex, and smoking status of the patient).
\(H_1: \beta_1\ne0\) (Alternative hypothesis: There is an association between the hazard of death and the predictor of interest, after adjusting for age, sex, and smoking status of the patient).
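To make the idea of maximum partial likelihood more concrete, the following Python sketch fits a Cox model with a single binary covariate by Newton-Raphson, using the Breslow form of the partial likelihood. It is a minimal sketch under stated assumptions, not the software used to fit the models in this report: the function names and toy data are hypothetical, only one covariate is handled, and no step-halving or other safeguards are included.

```python
import math

def cox_score_and_info(beta, times, events, x):
    """Score (first derivative) and information (negative second derivative)
    of the Breslow partial log-likelihood for one covariate."""
    score, info = 0.0, 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i != 1:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [(math.exp(beta * x_j), x_j)
                for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for w, _ in risk)
        s1 = sum(w * x_j for w, x_j in risk)
        s2 = sum(w * x_j * x_j for w, x_j in risk)
        mean = s1 / s0
        score += x_i - mean
        info += s2 / s0 - mean ** 2
    return score, info

def fit_cox_one_covariate(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson for beta; returns (beta_hat, standard_error)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_and_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_and_info(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)

# Hypothetical toy data: 1 = death, 0 = censored; x = 1 if condition present.
cox_times = [1, 2, 3, 4, 5, 6, 7, 8]
cox_events = [1, 1, 1, 0, 1, 1, 0, 1]
cox_x = [1, 1, 0, 1, 0, 1, 0, 0]
beta_hat, se_hat = fit_cox_one_covariate(cox_times, cox_events, cox_x)
hazard_ratio = math.exp(beta_hat)
wald_z = beta_hat / se_hat  # Wald statistic for H0: beta = 0
print(f"beta = {beta_hat:.3f}, HR = {hazard_ratio:.2f}, z = {wald_z:.2f}")
```

The baseline hazard never appears in the computation, which is what makes the approach semi-parametric: only the ordering of event times and the composition of each risk set matter.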
Model Assumptions
The Cox proportional hazards model makes several assumptions, and assessing whether a fitted Cox regression model adequately describes the data is essential. These assumptions include:
Proportional hazards assumption: This implies that the hazard ratio measuring the effect of any predictor is constant over time. This was assessed by statistical tests and graphical diagnostics based on the scaled Schoenfeld residuals.
Linearity assumption: The Cox regression assumes that the continuous predictors have a linear relationship with the outcome’s log hazard relative to the baseline hazard, which was assessed by plotting the Martingale residuals against the continuous variable.
Influential Observations: These observations alter the regression coefficient by a meaningful amount when included in the data. They were examined by visualizing the dfbeta values.
Outliers: These are observations with very large residuals (in either direction). They were checked by visualizing the deviance residuals.
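The relationship between the Martingale and deviance residuals mentioned above can be sketched as follows. This is an illustrative Python fragment only: the function names are hypothetical, and the estimated cumulative hazard for each patient is assumed to come from an already fitted Cox model, so the numbers passed in below are made up.

```python
import math

def martingale_residual(event, cum_hazard):
    """Observed event indicator minus the model's expected number of events."""
    return event - cum_hazard

def deviance_residual(event, cum_hazard):
    """Transformation of the martingale residual that is more symmetric about
    zero, which makes unusually large residuals easier to spot in a plot."""
    m = martingale_residual(event, cum_hazard)
    # For a death (event == 1), event - m equals the cumulative hazard.
    log_term = math.log(event - m) if event == 1 else 0.0
    inner = m + event * log_term
    return math.copysign(math.sqrt(max(-2.0 * inner, 0.0)), m)

# Hypothetical (event, estimated cumulative hazard) pairs.
examples = [(1, 0.1), (1, 1.0), (0, 0.5), (0, 2.0)]
for event, cum_hazard in examples:
    m = martingale_residual(event, cum_hazard)
    d = deviance_residual(event, cum_hazard)
    print(f"event={event}  H={cum_hazard:<4}  martingale={m:+.2f}  deviance={d:+.2f}")
```

A large positive deviance residual flags a patient who died much earlier than the model expected, and a large negative one flags a patient who survived much longer than expected.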
Methodology: AI-Assisted Writing
This report’s text was reviewed and refined using several AI-powered tools: Grammarly for grammar and style checking, and Claude and ChatGPT for general writing advice and suggestions. While these AI assistants were used to enhance clarity and correctness, all core ideas, analyses, and conclusions are the author’s own.
3 Results
Exploratory Data Analysis Findings
Survival Analysis Results
Kaplan-Meier Survival Curves
Cox Proportional Hazards Model Results
Model 1: Diabetes
Outcome: Surv(time, DEATH_EVENT)
| Predictors | Estimates (hazard ratio) | 95% CI | p |
|---|---|---|---|
| diabetes [Diabetes] | 1.13 | 0.74 – 1.72 | 0.571 |
| age | 1.04 | 1.03 – 1.06 | <0.001 |
| sex [Male] | 0.97 | 0.61 – 1.55 | 0.897 |
| smoking [1] | 1.08 | 0.66 – 1.75 | 0.766 |
| Observations | 299 | | |
| R² Nagelkerke | 0.079 | | |
Model 2: Anemia
Outcome: Surv(time, DEATH_EVENT)
| Predictors | Estimates (hazard ratio) | 95% CI | p |
|---|---|---|---|
| anaemia [anaemia] | 1.33 | 0.89 – 2.00 | 0.161 |
| age | 1.04 | 1.03 – 1.06 | <0.001 |
| sex [Male] | 0.97 | 0.61 – 1.55 | 0.890 |
| smoking [1] | 1.09 | 0.67 – 1.77 | 0.736 |
| Observations | 299 | | |
| R² Nagelkerke | 0.085 | | |
Model 3: High Blood Pressure
Outcome: Surv(time, DEATH_EVENT)
| Predictors | Estimates (hazard ratio) | 95% CI | p |
|---|---|---|---|
| high blood pressure [HBP] | 1.53 | 1.01 – 2.31 | 0.045 |
| age | 1.04 | 1.03 – 1.06 | <0.001 |
| sex [Male] | 0.99 | 0.62 – 1.58 | 0.970 |
| smoking [1] | 1.09 | 0.67 – 1.77 | 0.734 |
| Observations | 299 | | |
| R² Nagelkerke | 0.091 | | |
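As a rough consistency check on how a hazard ratio, its confidence interval, and its p-value relate, the Python sketch below back-calculates the coefficient, standard error, and Wald p-value from the rounded values reported for high blood pressure in Model 3 (1.53, 1.01 to 2.31). It assumes a Wald-type 95% interval of the form exp(β ± 1.96·SE); the function name is hypothetical, and because the inputs are rounded to two decimals only approximate agreement with the reported p-value should be expected.

```python
import math

def wald_from_hr_ci(hr, lower, upper, z_crit=1.96):
    """Recover (beta, se, z, p) from a hazard ratio and its Wald-type CI."""
    beta = math.log(hr)
    # On the log scale the interval is beta +/- z_crit * se.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

hbp_beta, hbp_se, hbp_z, hbp_p = wald_from_hr_ci(1.53, 1.01, 2.31)
print(f"beta = {hbp_beta:.3f}, SE = {hbp_se:.3f}, z = {hbp_z:.2f}, p = {hbp_p:.3f}")
```

The recovered p-value is close to the reported 0.045, which suggests the table is internally consistent.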
4 Discussion
The primary objective of this study was to investigate the impact of specific comorbidities—namely high blood pressure, anemia, and diabetes—on mortality among patients with heart failure. We utilized a dataset from a study conducted in Faisalabad, Pakistan, which included 299 patients diagnosed with heart failure with left ventricular systolic dysfunction.
Descriptive statistics revealed 96 deaths (32%) among the cohort, with a median follow-up time of 44.5 days for these cases. Patient age ranged from 40 to 95 years (mean = 61, SD = 12). The majority of patients were male (65%), non-smokers (68%), non-anemic (57%), non-diabetic (58%), and without high blood pressure (65%).
We estimated the overall survival function and plotted Kaplan-Meier survival curves for the entire cohort, as well as stratified by diabetes, anemia, and high blood pressure status. Log-rank tests were used to test for differences in the survival functions between the groups defined by each condition. The survival functions differed significantly between patients with and without high blood pressure (χ² = 4.4, df = 1, p = 0.04), with patients without high blood pressure showing higher survival probabilities than those with high blood pressure. The median survival time could not be computed because the estimated survival function did not fall to 0.50 during follow-up.
Three independent multiple Cox proportional hazards models were fitted to investigate the association between the hazard of death from heart failure and the comorbidities of interest. After adjusting for age, sex, and smoking status, our findings revealed:
High blood pressure was significantly and positively associated with the hazard of death (Adjusted Hazard Ratio [AHR] = 1.53; 95% Confidence Interval [CI] = 1.01-2.31; p = 0.045). At any given time during follow-up, patients with high blood pressure had 1.53 times the hazard of death of those without high blood pressure.
The models for anemia and diabetes yielded non-significant findings for these predictors of interest.
Patient age was significantly associated with the hazard of death across all models, with older patients demonstrating a greater hazard.
We assessed model assumptions using various diagnostic statistics and plots. The proportional hazards assumption was evaluated by plotting scaled Schoenfeld residuals against time for each covariate and conducting a global test for each model. This assumption was supported for all models. Linearity was assessed by plotting Martingale residuals against each model’s continuous covariate (Age), which revealed a non-linear relationship. Outliers and influential observations were examined by plotting deviance residuals and dfbeta values for each observation. We found no influential observations in any model, but some outliers were noted.
To address the violation of linearity, we converted the age variable to a categorical variable, “Age_Category,” with two ranges: “(40, 60]” and “(60, 95],” based on the dataset’s median age. We subsequently refitted the Cox proportional hazards models using this variable. Our findings were consistent with the initial models, demonstrating a significantly positive association between high blood pressure and the hazard of death. The age category variable was also significantly associated with the hazard of death for each fitted model. The revised models satisfied the assumptions, although outliers remained present.
The analysis provides insight into how specific conditions and demographic factors are associated with the risk of mortality in patients with heart failure and left ventricular systolic dysfunction. Age played a significant role, with older individuals facing a higher hazard of death. Among the three comorbidities examined, high blood pressure was the only one significantly associated with the hazard of death. These results highlight the importance of careful management of high blood pressure in heart failure patients. Further studies should explore this relationship in greater depth and investigate any unidentified factors that may have contributed to this outcome.
5 Appendix
Diagnostics Plots
Testing Proportional Hazards Assumption
Model 1
Model 2
Model 3
Checking Influential Observations
Model 1
Model 2
Model 3
Checking Outliers
Model 1
Model 2
Model 3
Testing Non Linearity
Other Statistical results
Median follow-up time (days) among patients who died:
[1] 44.5
Call:
survdiff(formula = Surv(time, DEATH_EVENT) ~ high_blood_pressure,
data = heart_failure)
| Group | N | Observed | Expected | (O-E)^2/E | (O-E)^2/V |
|---|---|---|---|---|---|
| high_blood_pressure=No HBP | 194 | 57 | 66.4 | 1.34 | 4.41 |
| high_blood_pressure=HBP | 105 | 39 | 29.6 | 3.00 | 4.41 |
Chisq= 4.4 on 1 degrees of freedom, p= 0.04
Sensitivity Analysis Results
| Characteristic | N = 299¹ |
|---|---|
| anaemia | |
| No anaemia | 170 (57%) |
| anaemia | 129 (43%) |
| diabetes | |
| No Diabetes | 174 (58%) |
| Diabetes | 125 (42%) |
| DEATH_EVENT | 96 (32%) |
| age | 60 (51, 70) |
| sex | |
| Female | 105 (35%) |
| Male | 194 (65%) |
| smoking | |
| 0 | 203 (68%) |
| 1 | 96 (32%) |
| high_blood_pressure | |
| No HBP | 194 (65%) |
| HBP | 105 (35%) |
| time | 115 (73, 203) |
| Age_Category | |
| (40, 60] | 162 (54%) |
| (60, 95] | 137 (46%) |
| ¹ n (%); Median (IQR) | |
Model 1: Diabetes
Outcome: Surv(time, DEATH_EVENT)
| Predictors | Estimates (hazard ratio) | 95% CI | p |
|---|---|---|---|
| diabetes [Diabetes] | 0.99 | 0.65 – 1.49 | 0.951 |
| Age Category [>60–95] | 1.57 | 1.05 – 2.35 | 0.028 |
| sex [Male] | 1.01 | 0.63 – 1.62 | 0.956 |
| smoking [1] | 1.00 | 0.61 – 1.63 | 0.998 |
| Observations | 299 | | |
| R² Nagelkerke | 0.017 | | |
Model 2: Anemia
Outcome: Surv(time, DEATH_EVENT)
| Predictors | Estimates (hazard ratio) | 95% CI | p |
|---|---|---|---|
| anaemia [anaemia] | 1.40 | 0.93 – 2.09 | 0.104 |
| Age Category [>60–95] | 1.57 | 1.05 – 2.35 | 0.028 |
| sex [Male] | 1.02 | 0.63 – 1.62 | 0.950 |
| smoking [1] | 1.03 | 0.63 – 1.68 | 0.898 |
| Observations | 299 | | |
| R² Nagelkerke | 0.026 | | |
Model 3: High Blood Pressure
Outcome: Surv(time, DEATH_EVENT)
| Predictors | Estimates (hazard ratio) | 95% CI | p |
|---|---|---|---|
| high blood pressure [HBP] | 1.52 | 1.00 – 2.29 | 0.049 |
| Age Category [>60–95] | 1.54 | 1.03 – 2.30 | 0.036 |
| sex [Male] | 1.05 | 0.65 – 1.68 | 0.847 |
| smoking [1] | 1.02 | 0.63 – 1.65 | 0.942 |
| Observations | 299 | | |
| R² Nagelkerke | 0.030 | | |
Diagnostics Plots for Sensitivity Analysis
Testing Proportional Hazards Assumption
Model 1
Model 2
Model 3
Checking Influential Observations
Model 1
Model 2
Model 3
Checking Outliers
Model 1
Model 2
Model 3